Lecture 1: Review — MGFs, Transformations, Common Distributions
2026-01-13
Welcome to STAT 456: Mathematical Statistics
Today’s objectives — Review key tools from 443 that underpin everything in this course:
Definition (MGF)
The moment generating function of a random variable \(X\) is \[M_X(t) = E[e^{tX}]\] provided this expectation exists for all \(t\) in some interval \((-h, h)\) for \(h > 0\).
The obvious question: Why do we require \(M_X(t)\) to exist for \(t \in (-h, h)\) and not just at \(t = 0\)?
The trivial answer: At \(t=0\), every random variable has \[M_X(0) = E[e^0] = E[1] = 1\] This tells us nothing about \(X\)!
The deeper reason: We need derivatives to recover moments: \[E[X^n] = M_X^{(n)}(0) = \left. \frac{d^n}{dt^n} M_X(t) \right|_{t=0}\]
Taking the derivative at \(t = 0\) requires \(M_X(t)\) to be defined and finite on an open interval around 0, not just at the single point \(t = 0\).
The Philosophical Point
The MGF encodes the entire distribution into a single function. But this requires the distribution’s tails to be “light enough” that \(E[e^{tX}]\) converges.
Example of failure: The Cauchy distribution
\[f(x) = \frac{1}{\pi(1 + x^2)}\]
For any \(t \neq 0\), \(E[e^{tX}] = \infty\): the exponential factor \(e^{tx}\) overwhelms the polynomial tail decay \(1/x^2\) in one direction. The Cauchy distribution therefore has no MGF; in fact it has no finite moments of any order.
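We can watch this failure numerically (a numpy sketch of mine, truncating the defining integral at \(\pm A\)): at \(t = 0\) the truncated integral settles at 1, the total probability, but for \(t \ne 0\) it grows without bound as \(A\) increases.

```python
import numpy as np

def truncated_mgf(t, A, n=100_001):
    """Trapezoid approximation of the integral of e^{tx} / (pi (1 + x^2)) over [-A, A]."""
    x = np.linspace(-A, A, n)
    y = np.exp(t * x) / (np.pi * (1.0 + x**2))
    dx = x[1] - x[0]
    return dx * (y.sum() - 0.5 * (y[0] + y[-1]))

# t = 0: just accumulating probability, approaches 1
# t = 0.5: the truncated integral blows up, so E[e^{tX}] = infinity
for A in [10, 50, 250]:
    print(A, truncated_mgf(0.0, A), truncated_mgf(0.5, A))
```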
Discussion: What do heavy-tailed distributions tell us about real-world phenomena? When should we expect MGFs to fail?
Theorem (Generating Moments)
If \(X\) has mgf \(M_X(t)\), then \[E[X^n] = M_X^{(n)}(0) = \left. \frac{d^n}{dt^n} M_X(t) \right|_{t=0}\]
Theorem (Linear Transformation)
For constants \(a, b\): \[M_{aX+b}(t) = e^{bt} M_X(at)\]
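As a quick sanity check (a sympy sketch of mine, not part of the lecture), we can verify the linear-transformation rule for the normal MGF \(\exp(\mu t + \sigma^2 t^2/2)\): if \(X \sim N(\mu, \sigma^2)\), then \(aX + b \sim N(a\mu + b, a^2\sigma^2)\), and \(e^{bt} M_X(at)\) reproduces exactly that MGF.

```python
import sympy as sp

t, a, b, mu, sigma = sp.symbols('t a b mu sigma', real=True)

# MGF of X ~ N(mu, sigma^2)
M_X = sp.exp(mu*t + sigma**2 * t**2 / 2)

# MGF of aX + b, which is N(a*mu + b, a^2 sigma^2)
lhs = sp.exp((a*mu + b)*t + a**2 * sigma**2 * t**2 / 2)

# The theorem's right-hand side: e^{bt} M_X(at)
rhs = sp.exp(b*t) * M_X.subs(t, a*t)

print(sp.simplify(lhs - rhs))  # 0: the two sides agree
```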
The name isn’t arbitrary!
Expand \(M_X(t) = E[e^{tX}]\) using the Taylor series of \(e^{tX}\): \[M_X(t) = E\left[\sum_{k=0}^\infty \frac{(tX)^k}{k!}\right] = \sum_{k=0}^\infty \frac{t^k}{k!} E[X^k]\] (Exchanging \(E\) and the infinite sum is justified precisely because \(M_X\) exists on \((-h, h)\).)
So: \[M_X(t) = 1 + tE[X] + \frac{t^2}{2!}E[X^2] + \frac{t^3}{3!}E[X^3] + \cdots\]
The moments are the Taylor coefficients!
Take the \(n\)-th derivative and evaluate at \(t=0\): \[M_X^{(n)}(0) = E[X^n]\]
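To see the "generating" behavior concretely, here is a short sympy sketch (my own illustration, not from the lecture) expanding the Exponential(\(\beta\)) MGF \((1-\beta t)^{-1}\) and reading off \(E[X^n] = n!\,\beta^n\) from the Taylor coefficients:

```python
import sympy as sp

t, beta = sp.symbols('t beta', positive=True)
M = 1 / (1 - beta*t)                      # MGF of Exponential(beta)

expansion = sp.series(M, t, 0, 5).removeO()
for n in range(1, 5):
    coeff = expansion.coeff(t, n)         # Taylor coefficient = E[X^n] / n!
    moment = sp.factorial(n) * coeff
    print(n, moment)                      # n! * beta**n
```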
The Insight
The MGF is the generating function for the moment sequence \(\{E[X^n]\}_{n=0}^\infty\).
This viewpoint connects directly to the next result:
Theorem (MGF Uniqueness)
Let \(F_X\) and \(F_Y\) be two cdfs whose mgfs exist.
If \(M_X(t) = M_Y(t)\) for all \(t\) in some neighborhood of 0, then \(F_X(u) = F_Y(u)\) for all \(u\).
Key Point: This is why MGFs are so useful for identifying distributions — if two random variables have the same MGF, they have the same distribution!
Why is this theorem revolutionary?
Without uniqueness, to verify two random variables have the same distribution, we’d need to check: \[F_X(u) = F_Y(u) \quad \text{for } \textbf{every} \, u \in \mathbb{R}\]
That’s an infinite number of checks!
With uniqueness, we only need to verify: \[M_X(t) = M_Y(t) \quad \text{for } t \in (-h, h)\]
A single functional equation replaces infinitely many pointwise checks.
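A classic use of uniqueness, sketched here with sympy (this also relies on the standard fact, assumed rather than proved above, that independence makes the MGF of a sum the product of the individual MGFs): if \(X \sim \text{Gamma}(\alpha_1, \beta)\) and \(Y \sim \text{Gamma}(\alpha_2, \beta)\) are independent, the product of their MGFs is again a Gamma MGF, so uniqueness gives \(X + Y \sim \text{Gamma}(\alpha_1 + \alpha_2, \beta)\).

```python
import sympy as sp

t, beta, a1, a2 = sp.symbols('t beta alpha1 alpha2', positive=True)

M1 = (1 - beta*t) ** (-a1)          # MGF of Gamma(alpha1, beta)
M2 = (1 - beta*t) ** (-a2)          # MGF of Gamma(alpha2, beta)

# Independence: MGF of the sum is the product of the MGFs
M_sum = sp.powsimp(M1 * M2)
print(M_sum)                        # (1 - beta*t)**(-alpha1 - alpha2)

# This is the Gamma(alpha1 + alpha2, beta) MGF, so by uniqueness
# that is the distribution of X + Y.
```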
Three Powerful Tools
MGFs give us three results we will use over and over in this course: generating moments by differentiation, identifying distributions via uniqueness, and establishing convergence in distribution (next).
Discussion: What other areas of mathematics leverage “finding the right representation” to simplify problems?
Theorem (Convergence of MGFs)
Suppose \(\{X_i\}\) is a sequence of random variables with mgfs \(M_{X_i}(t)\).
If \(\lim_{i \to \infty} M_{X_i}(t) = M_X(t)\) for all \(t\) in a neighborhood of 0, and \(M_X(t)\) is an mgf, then \[\lim_{i \to \infty} F_{X_i}(x) = F_X(x)\] at all continuity points of \(F_X\).
Application: Key for asymptotics — we’ll use this for MLE asymptotic normality later in the course.
| Distribution | Parameters | MGF \(M_X(t)\) | Constraint |
|---|---|---|---|
| Normal | \(\mu, \sigma^2\) | \(\exp(\mu t + \sigma^2 t^2/2)\) | all \(t\) |
| Gamma | \(\alpha, \beta\) | \((1 - \beta t)^{-\alpha}\) | \(t < 1/\beta\) |
| Chi-squared | \(p\) (df) | \((1-2t)^{-p/2}\) | \(t < 1/2\) |
| Exponential | \(\beta\) | \((1 - \beta t)^{-1}\) | \(t < 1/\beta\) |
| Binomial | \(n, p\) | \((pe^t + 1-p)^n\) | all \(t\) |
| Poisson | \(\lambda\) | \(\exp(\lambda(e^t - 1))\) | all \(t\) |
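A quick Monte Carlo sanity check of a few table rows (a numpy sketch of mine; the sample size and the value \(t = 0.2\) are arbitrary choices, with \(t\) kept small enough that the relevant expectations are finite):

```python
import numpy as np

rng = np.random.default_rng(0)
n, t = 500_000, 0.2

# (sample, MGF formula evaluated at t) for three rows of the table
checks = {
    "Normal(0, 1)":        (rng.standard_normal(n),           np.exp(t**2 / 2)),
    "Exponential(beta=2)": (rng.exponential(scale=2, size=n), 1 / (1 - 2*t)),
    "Poisson(lambda=3)":   (rng.poisson(lam=3, size=n),       np.exp(3*(np.exp(t) - 1))),
}

for name, (sample, formula) in checks.items():
    empirical = np.exp(t * sample).mean()   # sample average of e^{tX}
    print(f"{name}: empirical {empirical:.4f} vs formula {formula:.4f}")
```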
Transformations of Random Variables
Univariate case:
If \(Y = g(X)\) with \(g\) strictly monotone and differentiable: \[f_Y(y) = f_X(g^{-1}(y)) \cdot \left| \frac{d}{dy} g^{-1}(y) \right|\]
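A numerical check of the univariate formula (my sketch with numpy/scipy): take \(X \sim \text{Exponential}(1)\) and \(Y = g(X) = X^2\), which is strictly monotone on the support \((0, \infty)\). The formula gives \(f_Y(y) = e^{-\sqrt{y}} / (2\sqrt{y})\), which should integrate to 1 and reproduce \(P(Y \le 1) = P(X \le 1) = 1 - e^{-1}\).

```python
import numpy as np
from scipy.integrate import quad

# X ~ Exponential(1), Y = X^2; g^{-1}(y) = sqrt(y), (g^{-1})'(y) = 1/(2 sqrt(y))
f_Y = lambda y: np.exp(-np.sqrt(y)) / (2.0 * np.sqrt(y))

total, _ = quad(f_Y, 0, np.inf)        # should be 1: f_Y is a valid density
prob_formula, _ = quad(f_Y, 0, 1.0)    # P(Y <= 1) from the transformed density

# Cross-check with a Monte Carlo sample of Y = X^2
rng = np.random.default_rng(1)
y = rng.exponential(size=200_000) ** 2
prob_mc = (y <= 1.0).mean()

print(total)                    # ≈ 1.0
print(prob_formula, prob_mc)    # both ≈ 1 - exp(-1) ≈ 0.632
```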
Multivariate case:
If \(\mathbf{Y} = g(\mathbf{X})\) with \(g: \mathbb{R}^n \to \mathbb{R}^n\) a diffeomorphism: \[f_{\mathbf{Y}}(\mathbf{y}) = f_{\mathbf{X}}(g^{-1}(\mathbf{y})) \cdot |J|\]
where \(|J|\) is the absolute value of the Jacobian determinant of \(g^{-1}\), evaluated at \(\mathbf{y}\).
Worked Example: The Gamma MGF
Let \(X \sim \text{Gamma}(\alpha, \beta)\). Then: \[M_X(t) = \frac{1}{\Gamma(\alpha)\beta^\alpha} \int_0^\infty e^{tx} x^{\alpha-1} e^{-x/\beta} \, dx\]
Combine the exponentials: \(e^{tx} e^{-x/\beta} = \exp\left(-x \cdot \frac{1 - \beta t}{\beta}\right)\), so the integrand is the kernel of a Gamma\((\alpha, \beta/(1-\beta t))\) density and the integral equals \(\Gamma(\alpha) \left(\beta/(1-\beta t)\right)^\alpha\). Hence \[M_X(t) = \left( \frac{1}{1 - \beta t} \right)^\alpha, \quad t < \frac{1}{\beta}\]
Finding moments: \[E[X] = M'_X(0) = \alpha\beta\] \[E[X^2] = M''_X(0) = \alpha(\alpha+1)\beta^2\] \[\text{Var}(X) = \alpha\beta^2\]
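The same derivatives can be checked mechanically with sympy (a sketch of mine mirroring the computation above):

```python
import sympy as sp

t, alpha, beta = sp.symbols('t alpha beta', positive=True)
M = (1 - beta*t) ** (-alpha)             # Gamma(alpha, beta) MGF

EX  = sp.diff(M, t).subs(t, 0)           # E[X]   = alpha * beta
EX2 = sp.diff(M, t, 2).subs(t, 0)        # E[X^2] = alpha * (alpha + 1) * beta^2
var = sp.simplify(EX2 - EX**2)           # Var(X) = alpha * beta^2

print(EX, EX2, var)
```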
Setup: \(X \sim \text{Binomial}(n, p)\) with \(\lambda = np\) fixed as \(n \to \infty\)
Binomial MGF: \(M_X(t) = (pe^t + 1-p)^n\)
Poisson MGF: \(M_Y(t) = e^{\lambda(e^t - 1)}\) where \(Y \sim \text{Poisson}(\lambda)\)
Convergence: With \(p = \lambda/n\): \[M_X(t) = \left(1 + \frac{\lambda(e^t - 1)}{n}\right)^n \to e^{\lambda(e^t - 1)} = M_Y(t)\]
By MGF Convergence Theorem: Binomial\((n, \lambda/n) \xrightarrow{d}\) Poisson\((\lambda)\)
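The convergence of MGFs is easy to watch numerically (a numpy sketch; \(\lambda = 3\) and \(t = 0.5\) are arbitrary choices):

```python
import numpy as np

lam, t = 3.0, 0.5
target = np.exp(lam * (np.exp(t) - 1.0))   # Poisson(lam) MGF at t

# Binomial(n, lam/n) MGF at t, for growing n
for n in [10, 100, 10_000, 1_000_000]:
    p = lam / n
    binom_mgf = (p * np.exp(t) + 1.0 - p) ** n
    print(n, binom_mgf, abs(binom_mgf - target))
```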
Course roadmap:
MGFs
↓
Sampling distributions (next)
↓
Sufficiency
↓
Point estimation (MLE, UMVUE)
↓
Hypothesis testing
The unifying theme: Extracting information efficiently
The Philosophy
Each concept involves finding the right representation to make hard problems tractable. MGF uniqueness is just the first example of this powerful pattern.
Next lecture: Sampling from the normal distribution — \(\bar{X}\), \(S^2\), and their independence